Consistent Binary Classification with Generalized Performance Metrics

Authors

Oluwasanmi Koyejo

Nagarajan Natarajan

Pradeep Ravikumar

Inderjit S. Dhillon

Abstract

Performance metrics for binary classification are designed to capture tradeoffs between four fundamental population quantities: true positives, false positives, true negatives and false negatives. Despite significant interest from theoretical and applied communities, little is known about either optimal classifiers or consistent algorithms for optimizing binary classification performance metrics beyond a few special cases. We consider a fairly large family of performance metrics given by ratios of linear combinations of the four fundamental population quantities. This family includes many well-known binary classification metrics such as classification accuracy, AM measure, F-measure and the Jaccard similarity coefficient as special cases. Our analysis identifies the optimal classifiers as the sign of the thresholded conditional probability of the positive class, with a performance metric-dependent threshold. The optimal threshold can be constructed using simple plug-in estimators when the performance metric is a linear combination of the population quantities, but alternative techniques are required for the general case. We propose two algorithms for estimating the optimal classifiers, and prove their statistical consistency. Both algorithms are straightforward modifications of standard approaches that address the key challenge of optimal threshold selection, and are therefore simple to implement in practice. The first algorithm combines a plug-in estimate of the conditional probability of the positive class with optimal threshold selection. The second algorithm leverages recent work on calibrated asymmetric surrogate losses to construct candidate classifiers. We present empirical comparisons between these algorithms on benchmark datasets.
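The idea of thresholding an estimated conditional probability at a metric-dependent level can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that estimates of the positive-class probability (`eta_hat`, a hypothetical name) have already been produced by some model fit on separate data, and it picks the threshold by exhaustive search over the observed scores, which is an assumption made here for simplicity. The metric is written as a ratio of two affine functions of the empirical (TP, FP, FN, TN) rates; all function and constant names are illustrative.

```python
import numpy as np

def confusion_rates(y_true, y_pred):
    """Empirical TP, FP, FN, TN as fractions of the sample (they sum to 1)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.mean(y_true & y_pred)
    fp = np.mean(~y_true & y_pred)
    fn = np.mean(y_true & ~y_pred)
    tn = np.mean(~y_true & ~y_pred)
    return np.array([tp, fp, fn, tn])

def linear_fractional_metric(rates, num, den):
    """Ratio of two affine functions of (TP, FP, FN, TN).

    num and den are (constant, coefficients) pairs; a zero
    denominator is mapped to a score of 0.0 by convention here.
    """
    a0, a = num
    b0, b = den
    denominator = b0 + np.dot(b, rates)
    if denominator <= 0:
        return 0.0
    return float((a0 + np.dot(a, rates)) / denominator)

# Coefficient order is (TP, FP, FN, TN).
F1 = ((0.0, [2, 0, 0, 0]), (0.0, [2, 1, 1, 0]))        # 2TP / (2TP + FP + FN)
ACCURACY = ((0.0, [1, 0, 0, 1]), (1.0, [0, 0, 0, 0]))  # (TP + TN) / 1

def select_threshold(eta_hat, y_true, metric):
    """Return the (threshold, score) pair maximizing the empirical metric.

    Candidates are the distinct estimated probabilities; a point is
    predicted positive when eta_hat >= threshold. Ties go to the
    smallest candidate threshold.
    """
    eta_hat = np.asarray(eta_hat, dtype=float)
    num, den = metric
    candidates = np.unique(eta_hat)
    scores = [
        linear_fractional_metric(confusion_rates(y_true, eta_hat >= t), num, den)
        for t in candidates
    ]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]
```

For example, `select_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], F1)` searches the four observed scores and returns the one with the highest empirical F-measure. Swapping in `ACCURACY` reuses the same search with a different coefficient pair, which is the practical appeal of writing the metric family in this ratio form.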


Related resources

Optimal Decision-Theoretic Classification Using Non-Decomposable Performance Metrics

We provide a general theoretical analysis of expected out-of-sample utility, also referred to as decision-theoretic classification, for non-decomposable binary classification metrics such as F-measure and Jaccard coefficient. Our key result is that the expected out-of-sample utility for many performance metrics is provably optimized by a classifier which is equivalent to a signed thresholding of...

An Overview of General Performance Metrics of Binary Classifier Systems

The purpose of this document is to provide a brief overview of the different metrics and terminology used to measure the performance of binary classification systems.

Consistent Classification Algorithms for Multi-class Non-Decomposable Performance Metrics

We study consistency of learning algorithms for a multi-class performance metric that is a non-decomposable function of the confusion matrix of a classifier and cannot be expressed as a sum of losses on individual data points; examples of such performance metrics include the micro and macro F-measure used widely in information retrieval and the multi-class G-mean metric popular in c...

Generalized Douglas-Weyl Finsler Metrics

In this paper, we study generalized Douglas-Weyl Finsler metrics. We find some conditions under which the class of generalized Douglas-Weyl (α, β)-metrics with vanishing S-curvature reduces to the class of Berwald metrics.

An adaptive estimation method to predict thermal comfort indices man using car classification neural deep belief

Many experimental and theoretical indices of human thermal comfort and discomfort are calculated from climatic input data such as wind speed, temperature, humidity, and solar radiation. Daily data on temperature, wind speed, relative humidity, and cloudiness for the years 1382-1392 were used. In the first step, the Tmrt parameter was calculated in the Ray...



Publication date: 2014
